Version control:
Git
GitHub
Consistent file structure and naming (e.g., 0-dataMunging)
Clear and readable plots
Clear and properly commented code
The data was acquired through the Spotify API in 2020 by TidyTuesday.
The class of the data frame
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
[1] 32833
[1] "track_id" "track_name"
[3] "track_artist" "track_popularity"
[5] "track_album_id" "track_album_name"
[7] "track_album_release_date" "playlist_name"
[9] "playlist_id" "playlist_genre"
[11] "playlist_subgenre" "danceability"
[13] "energy" "key"
[15] "loudness" "mode"
[17] "speechiness" "acousticness"
[19] "instrumentalness" "liveness"
[21] "valence" "tempo"
[23] "duration_ms"
edm latin pop r&b rap rock
6043 5155 5507 5431 5746 4951
Check for NA observations
# A tibble: 5 × 4
track_name track_artist track_album_name track_id
<chr> <chr> <chr> <chr>
1 <NA> <NA> <NA> 69gRFGOWY9OMpFJgFol1u0
2 <NA> <NA> <NA> 5cjecvX0CmC9gK0Laf5EMQ
3 <NA> <NA> <NA> 5TTzhRSWQS4Yu8xTgAuq6D
4 <NA> <NA> <NA> 3VKFip3OdAvv4OfNTgFWeQ
5 <NA> <NA> <NA> 69gRFGOWY9OMpFJgFol1u0
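A check like the one above can be sketched in base R. The original code is not shown, so this is only an illustration: the `songs` data frame below is a made-up stand-in that reuses three of the real column names, and none of its values come from the dataset.

```r
# Toy stand-in for the real data; every value here is made up for illustration.
songs <- data.frame(
  track_id     = c("id1", "id2", "id3", "id4"),
  track_name   = c("Song A", NA, "Song C", NA),
  track_artist = c("Artist A", NA, "Artist C", NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(songs))

# Rows that contain at least one missing value.
na_rows <- songs[!complete.cases(songs), ]

print(na_per_column)
print(na_rows)
```

On the real data, printing only the name, artist, album and ID columns of `na_rows` would give a table shaped like the one above.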
After investigating, these are unique songs
Songs that have the same name
[1] 9383
Songs that have the same ID
[1] 4477
edm latin pop r&b rap rock
4877 4137 5132 4504 5401 4305
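The second genre table sums to 28356, which equals 32833 minus the 4477 repeated IDs, so it is consistent with keeping one row per track ID. A minimal base R sketch of that step follows, assuming the first occurrence of each ID is the one kept (the original code is not shown, and the toy `songs` values are made up).

```r
# Toy stand-in for the real data; IDs and genres are made up for illustration.
songs <- data.frame(
  track_id       = c("id1", "id2", "id1", "id3", "id2"),
  playlist_genre = c("pop", "rock", "edm", "rap", "rock"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each track_id.
n_repeated <- sum(duplicated(songs$track_id))

# Assumption: keep only the first occurrence of each track_id.
songs_unique <- songs[!duplicated(songs$track_id), ]

# Genre counts after de-duplication.
genre_counts <- table(songs_unique$playlist_genre)

print(n_repeated)
print(genre_counts)
```

Which occurrence is kept matters when the same track appears in playlists of different genres, because it decides the genre the track is counted under.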
How features change within genres
Latin stands out in danceability and valence.
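One way to compare features across genres is to average them per genre. The sketch below uses base R `aggregate()` on a made-up `songs` data frame. The column names match the dataset, but the values are invented and the original plotting code is not shown.

```r
# Toy stand-in for the real data; all feature values are made up.
songs <- data.frame(
  playlist_genre = c("latin", "latin", "rock", "rock"),
  danceability   = c(0.80, 0.70, 0.50, 0.60),
  valence        = c(0.70, 0.60, 0.40, 0.50)
)

# Mean danceability and valence for each genre.
genre_means <- aggregate(
  cbind(danceability, valence) ~ playlist_genre,
  data = songs,
  FUN  = mean
)

print(genre_means)
```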
What’s the correlation between energy, loudness and acousticness?
Positive correlation between energy and loudness
Negative correlation between energy and acousticness
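Correlations like these can be computed with base R `cor()`. This is only a sketch: the `features` values below are made up to mimic the direction of the relationships described, and Pearson correlation is an assumption, since the original method is not stated.

```r
# Toy stand-in for three audio features; all values are made up.
features <- data.frame(
  energy       = c(0.20, 0.40, 0.60, 0.80, 0.90),
  loudness     = c(-14, -10, -7, -5, -4),
  acousticness = c(0.90, 0.60, 0.35, 0.10, 0.05)
)

# Pairwise Pearson correlation matrix (assumed method).
cor_matrix <- cor(features, method = "pearson")

print(round(cor_matrix, 2))
```

The sign of each off-diagonal entry gives the direction of the relationship, and its magnitude gives the strength.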
Does the track duration affect the popularity of the song?
The analysis of the Spotify dataset yielded the following results:
Pop and Latin are the most popular genres.
Higher danceability and valence are associated with higher popularity.
Energy and loudness are positively correlated.
Energy and acousticness are negatively correlated.
Track duration does not have a clear effect on song popularity.